Non-centralized recommendation-based decision making is a central feature of several social and technological
processes, such as market dynamics, peer-to-peer file-sharing and the web of trust of digital certification.
We investigate the properties of trust propagation on networks, based on a simple metric of trust transitivity. We 
investigate analytically the percolation properties of trust transitivity in random networks with arbitrary degree 
distribution, and compare with numerical realizations. We find that the existence of a non-zero fraction of abso- 
lute trust (i.e. entirely confident trust) is a requirement for the viability of global trust propagation in large sys- 
tems: The average pair-wise trust is marked by a discontinuous transition at a specific fraction of absolute trust, 
below which it vanishes. Furthermore, we perform an extensive analysis of the Pretty Good Privacy (PGP) web 
of trust, in view of the concepts introduced. We compare different scenarios of trust distribution: community- 
and authority-centered. We find that these scenarios lead to sharply different patterns of trust propagation, due 
to the segregation of authority hubs and densely-connected communities. While the authority-centered scenario 
is more efficient, and leads to higher average trust values, it favours weakly-connected "fringe" nodes, which 
are directly trusted by authorities. The community-centered scheme, on the other hand, favours nodes with
intermediate degrees, to the detriment of the authorities and their "fringe" peers.

PACS numbers; 



I. INTRODUCTION 



Several social and technological systems rely on the notion 
of trust, or recommendation, where agents must make their de- 
cision based on the trustworthiness of other agents, with which 
they interact. One example is buyers in markets [1, 2], who
may share among themselves their experiences with different
sellers, or lenders who may share a belief that a given borrower
will not be able to pay back [3]. Another example
is peer-to-peer file-sharing programs [2, 4], which often must
know, without relying on a central authority, which other pro- 
grams act in a fair manner, and which act selfishly. In the same 
line, an even more direct example is the web of trust of digital
certification, such as the Pretty Good Privacy (PGP) system
[5, 6], where regular individuals must certify the authenticity
of other individuals with digital signatures. In all these
systems, the agents lack global information, and must infer the 
reliability of other agents, based solely on the opinion of trusted 
peers, thus forming a network of trust. In this paper, we present 
an analysis of trust propagation based on the notion of transi- 
tivity: If agent a trusts agent b, and agent b trusts agent c, then, 
to some extent, agent a will also trust agent c. Based on this 
simple concept, we define a trust metric with which the reliability
of any reachable agent may be inferred. Instead of concentrating
on the minutiae of trust propagation semantics, we
focus on the topological aspect of trust networks, using concepts
from network theory [7]. Using random networks as a
simple model, we investigate the necessary conditions for trust 
to "percolate" through an entire system. We then apply the con- 
cepts introduced to investigate in detail the PGP web of trust, 
possibly the best "real" example of a trust propagation system,
which is completely accessible for investigation. We focus
on the role of the strongly connected nodes in the network,
the so-called trust authorities, which represent a different
paradigm of trust delegation, in comparison to the decentralized
community-based approach, which is also heavily present in the
network.

This paper is organized as follows. In section II we define the
trust metric used; in section III we consider the problem of trust
percolation in random networks with different trust weight
distributions. In section IV we turn to the PGP network, and provide
an extensive analysis of its topology and of trust propagation
according to different trust distribution scenarios. Finally, in
section V we provide some final remarks and a conclusion.



II. TRUST METRIC 

Trust is the measure of belief that a given entity will act as 
one expects. It is often associated with positive, desirable at- 
tributes, but it may not always be the case (e.g. one may have 
trust that someone will act undesirably). Humans use trust to 
make decisions when more direct information is unavailable. 
In general, humans will decide their level of trust based on arbitrary,
heuristic rules, since there is no formal consensus on
how to evaluate trust. We will deliberately avoid the detailed 
formalization of these rules, and instead rely on two simplifica- 
tions: 1. We will treat trust simply as a probability that a given 
assessment about an agent is true or false (e.g. fair/reliable or 
not); 2. We further assume that this belief is transitive, i.e. if 
agent a trusts agent b, which in turn trusts agent c, then a will
also trust c, to some extent. This makes trust propagation easier 
to analyse, while retaining the most intuitive properties of trust 
propagation. 

FIG. 1: Examples of trust networks. Left: A directed tree. Right:
A more realistic example. The edges in blue are the ones which
contribute to the value of trust from Bob to Alice, according to Eq. 6.

We will consider a system of N agents which form a directed
trust network: Each agent v (represented by a vertex, or node)
has a number of interactions (represented by directed edges, or
links) with other agents $\{u_i\}$, for which a value $c_{v,u_i} \in [0, 1]$ of
direct trust is defined a priori, and which can be interpreted as
a probability. This value represents a direct experience agent v
had with $u_i$, which is not inferred from any other agent. We then
define the inferred trust $t_{i,j} \in [0, 1]$ from agent i to any agent
j, which is somehow based on the values of c. In a simple
situation where there is only one possible path between any two
given nodes (i.e. the network is a directed tree, as the example
on the left in Fig. 1), one could simply multiply the values
of c along the single path to obtain t, e.g. $t_{\text{Alice},\text{Bob}} = c_1 c_3$ in
the example of Fig. 1. In general, however, the situation may
be more complicated, as in the example on the right of Fig. 1,
where there is a variety of possible (often "contradictory")
transitive paths between most pairs of nodes. Perhaps the simplest
way of defining a trust metric would be to consider only the best
transitivity path between two nodes, i.e., the one where the trust
transitivity is maximum,

$$ s_{u,v} = \max_{\{P_{u \to v}\}} \prod_{e_i} c_{e_i}, \qquad (1) $$



where $P_{u \to v}$ is the set of all paths from u to v, $\{e_i\}$ is the set
of edges in a given path, and $c_{e_i}$ is the direct trust associated
with a given edge. This definition is an attractive one, since
it corresponds directly to the concept of minimum distance on
weighted graphs, which is defined as the sum of weights along
the path with the smallest sum. This is easily seen by noticing
that $\prod_{e_i} c_{e_i} = \exp(-\sum_{e_i} \omega_{e_i})$, with $\omega_{e_i} = -\ln c_{e_i} \ge 0$
being the edge weights (with the special value of $\omega_{e_i} = \infty$
if $c_{e_i} = 0$). However, it is clear that this approach leads to
an optimistic bias, since the best path obviously favors large
values of trust, and uses only a small portion of the information
available in the network. As an illustration, consider the
network on the right of Fig. 1, where the value of $s_{\text{Alice},\text{Bob}}$ is
1 × 0.9 × 0.6 = 0.54, via Dave and Chuck. However, if Chuck is
directly consulted, the transitivity drops to 0.3 × 0.6 = 0.18. In
principle, there is no reason to prefer either of the two assessments
over the other. One may attempt to rectify this by considering
instead all possible paths between two nodes,
$$ t_{u,v} = \frac{\sum_{P_{u \to v}} \omega_{u \to v} \prod_{e_i} c_{e_i}}{\sum_{P_{u \to v}} \omega_{u \to v}}, \qquad (2) $$

where $\omega_{u \to v}$ is a weight associated with a given path $u \to v$. It
should be chosen to minimize the effect of a very large number
of paths with very low values of trust, without introducing an
optimistic bias on the final trust value. One apparently good
choice is to consider the transitivity value of the path itself, but
not including the last edge,

$$ \omega_{u \to v} = \prod_{e_i} c_{e_i}^{\,1 - \delta_{e_i, e_{\to v}}}, \qquad (3) $$



where $e_{\to v}$ is the last edge in the path, and $\delta$ is the Kronecker
delta. Not only does this avoid a bias in the final value of $t_{u,v}$, but
$\omega_{u \to v}$ also has a simple interpretation as the value of trust
on the final recommendation, which is completed by the last
edge. While this may seem reasonable, and uses all available
information in the network, it has two major drawbacks: 1. It
is very computationally costly to consider all possible paths
between two nodes, even in moderately sized networks. It would
represent an unreasonable effort on the part of the agents to use all
this information. 2. Computed as in Eq. 2, the value of $t_{u,v}$ has
the unsettling behaviour of tending to zero whenever the number
of paths becomes large (as it often is), even when paths
are differently weighted. Consider a simple scenario where the
network is a complete graph, i.e. all possible edges in the network
exist, and all of them have the same direct trust value c.
Since there are $\binom{N-2}{l}\, l!$ paths of length $l + 1$ between any two
vertices, each with transitivity $c^{l+1}$ and weight $c^{l}$, the value of
inferred trust between any two nodes can be calculated as

$$ t_{u,v} = \frac{\sum_{l=0}^{N-2} \binom{N-2}{l}\, l!\, c^{2l+1}}{\sum_{l=0}^{N-2} \binom{N-2}{l}\, l!\, c^{l}} < c, \qquad (4) $$

from which it is easy to see that $\lim_{N \to \infty} t_{u,v} = 0$ for c < 1.
This is an undesired behavior, since one would wish that such 
highly connected topologies (which often occur as subgraphs of 
social networks, known as cliques) would result in higher val- 
ues of trust. In order to compensate for this, one would have to
use a more aggressive weighting of the possible paths. We propose
the following modification, which combines some features
of both previous approaches: Instead of considering all possible
paths, we consider only those with the largest weights to all
the in-neighbours of the target vertex, as shown in Fig. 2. This
leads to a trust metric defined as

$$ t_{u,v} = \frac{\sum_{w} \left( s_{u,w}^{G \setminus \{v\}} \right)^2 c_{w,v}}{\sum_{w} s_{u,w}^{G \setminus \{v\}}}, \qquad (6) $$

where the sums run over the in-neighbours w of v, and the path
weights are the best trust transitivity to the in-neighbours,
$s_{u,w}^{G \setminus \{v\}}$, which are calculated after removing the target
vertex from the graph (so that it cannot influence its own
trust). We call this trust metric pervasive trust, and it corre-
sponds to the intuitive strategy of searching for the nodes with 
a direct interaction with the target node (the final arbitrators), 
and weighting their opinions according to the best possible trust 
transitivity leading to them. It can be seen that this definition 
does not suffer from the same problems as Eq. 2, again by considering
the same complete graph example, with uniform direct
trust c. Since in this situation every target vertex has $N - 2$
in-neighbours different from the source, and the shortest path to
each of these in-neighbours is of length one, the value of pervasive
trust can be easily calculated as

$$ t_{u,v} = \frac{(N - 2)\, c^3 + c}{(N - 2)\, c + 1}, \qquad (7) $$

which converges to $t_{u,v} = c^2$ for $N \gg 1$. Thus the indirect
opinions with value $c^2$ dominate the direct trust value c, but the
inferred value does not vanish, as with the definition of Eq. 2.
Considering again the example on the right of Fig. 1, we obtain
the value $t_{\text{Alice},\text{Bob}} = (0.9^2 \times 0.6 + (0.9 \times 0.7)^2 \times 0.3)/(0.9 + 0.9 \times 0.7) \approx 0.4$,
from the edges outlined in blue in the figure.
Additionally, the definition of pervasive trust works as one
would expect in the trivial example on the left of Fig. 1, where
$s_{u,v}$ and $t_{u,v}$ have the same values.

FIG. 2: Illustration of the paths used to calculate $t_{u,v}$ according to
Eq. 6. The vertices $w_i$ are the in-neighbours of v, and the values
$s_i = s_{u,w_i}^{G \setminus \{v\}}$ are the values of best trust (Eq. 1) from u to $w_i$, with vertex v
removed from the graph.
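The contrast between the two definitions on the complete graph can be checked numerically. The following Python sketch assumes the reading of the text given here: there are (N − 2)!/(N − 2 − l)! paths with l intermediate vertices, each with transitivity c^(l+1) and path weight c^l (the transitivity without the last edge). The function names are illustrative only.

```python
import math


def all_paths_trust(n, c):
    """Weighted average over all paths in a complete graph of n vertices.

    Assumption: a path with l intermediate vertices has transitivity
    c**(l + 1) and weight c**l (the transitivity without the last edge).
    Works for n up to roughly 170; beyond that the path counts no longer
    fit into a float.
    """
    num = 0.0
    den = 0.0
    for l in range(n - 1):  # l = 0, ..., n - 2 intermediate vertices
        n_paths = math.perm(n - 2, l)  # ordered choices of intermediates
        weight = n_paths * c ** l
        num += weight * c ** (l + 1)
        den += weight
    return num / den


def pervasive_trust_complete(n, c):
    """Pervasive trust in the same complete graph, Eq. (7)."""
    return ((n - 2) * c ** 3 + c) / ((n - 2) * c + 1)


if __name__ == "__main__":
    c = 0.9
    for n in (5, 20, 100):
        print(n, all_paths_trust(n, c), pervasive_trust_complete(n, c))
```

Under these assumptions the all-paths average shrinks steadily as n grows, because the sums are dominated by the longest paths, while the pervasive trust approaches c squared.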

We note that the numerical computation of $s_{u,v}$ for a single source u can be done
by using Dijkstra's shortest path algorithm [8, 9], which has
a complexity of $O(N \log N)$. Thus the entire matrix $s_{u,v}$ can
be calculated in $O(N^2 \log N)$ time. The same algorithm can
be used to calculate $t_{u,v}$, but since each target vertex needs to
be removed from the graph, and thus a new search needs to be
made for each different target, this results in $O(N^3 \log N)$ time.
It is possible to improve this by performing searches in the reversed
graph, i.e., for each target vertex v, the contribution to
$t_{u,v}$ from all sources u can be calculated simultaneously, after
v is removed, by performing a single reversed search from each
of the in-neighbours of v to each source u. This way, the entire
$t_{u,v}$ matrix can be computed in $O(k N^2 \log N)$ time (where
$k = E/N$ is the average degree of the network), which is comparable
to the computation time of $s_{u,v}$ for sparse graphs.
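As an illustration of the computation described above, the following Python sketch obtains the best trust with Dijkstra's algorithm on edge weights −ln c, and the pervasive trust as the weighted average over the in-neighbours of the target, with the target removed from the search. It is the straightforward per-pair variant, not the faster reversed-graph scheme. The graph representation (a dictionary of dictionaries) and the function names are assumptions made for this example.

```python
import heapq
import math


def best_trust(graph, source, removed=None):
    """Best trust transitivity from `source` to every reachable vertex.

    `graph[u][w]` is the direct trust of u in w. Dijkstra's algorithm is
    run on the edge weights -ln(c); `removed` is skipped entirely.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for w, c in graph.get(u, {}).items():
            if w == removed or c <= 0.0:
                continue  # zero trust corresponds to infinite weight
            nd = d - math.log(c)
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return {w: math.exp(-d) for w, d in dist.items()}


def pervasive_trust(graph, source, target):
    """Average of the in-neighbours' direct trust in `target`, weighted
    by the best trust from `source` to each of them (target removed)."""
    s = best_trust(graph, source, removed=target)
    num = den = 0.0
    for w, out in graph.items():
        if target in out and w in s:
            num += s[w] ** 2 * out[target]
            den += s[w]
    return num / den if den > 0.0 else 0.0


# Hypothetical graph, chosen so that the numbers coincide with the
# arithmetic quoted in the text; it is not the actual topology of Fig. 1.
example = {
    "Alice": {"Dave": 1.0, "Chuck": 0.3},
    "Dave": {"Chuck": 0.9},
    "Chuck": {"Bob": 0.6, "Eve": 0.7},
    "Eve": {"Bob": 0.3},
}

if __name__ == "__main__":
    print(best_trust(example, "Alice")["Bob"])
    print(pervasive_trust(example, "Alice", "Bob"))
```

The source itself counts as an in-neighbour with weight one whenever it has a direct edge to the target, which is what produces the isolated term c in Eq. 7.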



A. Comparison with other trust metrics

Other trust metrics have been proposed in the literature,
mainly by computer scientists seeking to formalize the notion
of trust in peer-to-peer computer systems. Some are quite detailed,
like the usage of subjective logic by Jøsang et al. [10],
and others are comparable with the simplistic approach taken
in this work, such as Eigentrust [4] and more recently
TrustWebRank [11]. These last metrics are based on the notion of
feedback centrality [9], which is calculated by solving a
linear system. The Eigentrust metric requires the trust network
to be a stochastic matrix (i.e. the trust values of
the out-edges of each vertex must sum to unity), and the inferred
trust values are given by the steady state distribution of
the corresponding Markov chain (i.e. the left eigenvector of the
stochastic matrix with unity eigenvalue, hence the name of the
metric). Thus the inferred trust values are global properties,
independent of any source vertex (i.e. non-personalized), which
is non-intuitive. Additionally, the requirement that the trust
network is stochastic means that only relative values of trust are
measured, and the absolute information is lost. Furthermore,
such an approach is strongly affected by the presence of loops
in the network, which get counted multiple times, which is also
non-intuitive as far as trust transitivity is concerned. The metric
TrustWebRank [11] tries to fix some of these problems by
borrowing ideas from the PageRank [12] algorithm, resulting
in a metric which also requires a stochastic matrix, but is
personalised. However, in order for the algorithm to converge, it
depends on the introduction of a damping factor which eliminates
the contribution of longer paths in the network, independently
of their trust values. This is an a priori assumption that these
paths are not relevant, and may not correspond to reality.
Additionally, the strange role of loops in the network is the same as
in the Eigentrust metric. However, since there is no consensus
on how trust propagates, and the notion of trust lacks a formal,
universally accepted definition, in the end there is no "correct"
or "wrong" metric. We only emphasize that our approach is derived
directly from the simple notion of trust transitivity, is easy
to interpret, propagates absolute values of trust, and makes no
assumption whatsoever about the network topology and direct
trust distribution.
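To make the loss of absolute information concrete, the following Python sketch computes a global score as the stationary distribution of the row-normalized trust matrix by power iteration. It only illustrates the stationary-distribution idea as described above and is not the full Eigentrust algorithm; the matrix and the function name are hypothetical.

```python
def stationary_trust(trust, n_iter=1000):
    """Stationary distribution of the row-normalised trust matrix.

    `trust[i][j]` is the direct trust of vertex i in vertex j. Every
    row is assumed to have a positive sum, and the matrix is assumed to
    be irreducible and aperiodic so that the iteration converges.
    """
    n = len(trust)
    # Row-normalise so that the outgoing trust of each vertex sums to one.
    rows = [[x / sum(row) for x in row] for row in trust]
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # One step of the Markov chain: pi <- pi P (left multiplication).
        pi = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical direct trust values between three agents.
trust = [
    [0.0, 0.5, 0.5],
    [0.9, 0.0, 0.1],
    [0.2, 0.2, 0.0],
]
# Same network, but the second agent trusts everybody ten times less.
scaled = [trust[0], [x / 10 for x in trust[1]], trust[2]]

if __name__ == "__main__":
    print(stationary_trust(trust))
    print(stationary_trust(scaled))
```

Both calls yield the same scores: after normalization, uniformly lowering all of an agent's trust values leaves no trace, and the result does not depend on who is asking.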



III. TRUST PERCOLATION 

Trust transitivity is based on the multiplication of direct trust 
values, which may tend to be low if the paths become long. 
Therefore, it is a central problem to determine if the trust transi- 
tivity between two randomly chosen vertices of a large network 
vanishes if the system becomes very large. This provides impor- 
tant information about the viability of trust transitivity on large 
systems. As a simple network model, we will consider random
directed networks with arbitrary degree distributions [13]. We
will also suppose that the direct trust values in the range between
c and c + dc will be independently distributed with probability
$p_c(c)\,dc$, where $p_c(c)$ is an arbitrary distribution. The
objective of this section is to calculate the average best trust
transitivity $\langle s \rangle$, given by Eq. 1, and the average pervasive trust
$\langle t \rangle$, Eq. 6, between randomly chosen pairs of source and target
vertices. In random networks, the value of average pervasive
trust will be given simply as $\langle t \rangle = \langle s \rangle \langle c \rangle$, since the best paths
to the in-neighbours of a given vertex are uncorrelated, and the
probability that they pass through the node itself tends to zero
in the limit of large network size. Therefore we need only
concern ourselves with the average best trust transitivity $\langle s \rangle$.

Networks are composed of components of different types and 
sizes: For each vertex there will be an out-component, which 
is the set of vertices reachable from it, and an in-component, 
which is the set of vertices for which it is reachable. A max- 
imal set of vertices which are mutually reachable is called a 
strongly connected component. Random graphs often display a 
phase transition in the size and number of these components: If 
the number of edges is large enough, there will be the sudden 
formation of a giant (in-, out-, strongly connected) component,
which spans a non-vanishing fraction of the network [17, 18].
The existence of these giant components is obviously necessary
for a non-vanishing value of trust to exist between most vertices,
but it is not sufficient, since it is still necessary that the
multiplication of direct trust values along most shortest paths
does not become vanishingly small. As an illustration, consider
a sparse graph (i.e. with finite average degree), with an arbitrary
degree distribution. In the situation where there is a sufficiently
large giant out-component in the graph, the average
shortest path from a randomly chosen root vertex to the rest of
the network is given approximately [18] by

$$ l = \frac{\ln(N / \langle k \rangle)}{\ln(\langle k_2 \rangle / \langle k \rangle)}, \qquad (8) $$

independently of the out-degree distribution (as long as $\langle k \rangle$ and
$\langle k_2 \rangle$ are finite and positive), where N is the number of vertices,
$\langle k \rangle$ is the average out-degree and $\langle k_2 \rangle$ is the average number
of second out-neighbours, and it is assumed that $N \gg \langle k \rangle$
and $\langle k_2 \rangle \gg \langle k \rangle$ [24]. Since the edges are weighted, the average
length of the best paths can differ from l, but can never
be smaller. Thus, an upper bound on the average best trust
is given by $\langle s \rangle = O(\max\{c_i\}^{l})$, where $\max\{c_i\}$ is the maximum
value of direct trust in the network. In the situation
where $\max\{c_i\} < 1$, we have that $\lim_{N \to \infty} \langle s \rangle = 0$, since
$\lim_{N \to \infty} l = \infty$. Therefore, if there are no values of c = 1
in the network, the average trust will always be zero in sparse
networks. The only possible strategies for non-vanishing values
of average trust are either to have a non-zero fraction of c = 1
(which we will call absolute trust), or for the network to be
dense, such that l remains finite for $N \to \infty$.
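The decay implied by this argument is slow, which a short calculation makes visible. The following Python sketch evaluates the shortest-path estimate, read here as l = ln(N/⟨k⟩)/ln(⟨k2⟩/⟨k⟩), and the resulting bound max{c}^l. An additive constant of order one in l, if present in the exact estimate, would not change the scaling. The example values and the relation ⟨k2⟩ = ⟨k⟩^2, which holds for Poisson degree distributions, are assumptions of this example.

```python
import math


def mean_shortest_path(n, k1, k2):
    """Estimate of the average shortest path length in a sparse random
    graph with n vertices, mean out-degree k1 and mean number of second
    out-neighbours k2 (requires n > k1 and k2 > k1)."""
    return math.log(n / k1) / math.log(k2 / k1)


def best_trust_bound(n, k1, k2, c_max):
    """Scale of the upper bound on the average best trust when no edge
    carries more direct trust than c_max."""
    return c_max ** mean_shortest_path(n, k1, k2)


if __name__ == "__main__":
    k = 5.0  # hypothetical mean degree; k2 = k**2 for Poisson graphs
    for n in (10**4, 10**6, 10**8):
        print(n, mean_shortest_path(n, k, k * k),
              best_trust_bound(n, k, k * k, 0.9))
```

Because l grows only logarithmically with N, the bound vanishes in the limit but remains appreciable for networks of practical size.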

With the above consideration in mind, we now move to 
calculate the average trust transitivity values. For that we 
modify the generating function method used in [13] to obtain
the distribution of component sizes. The objective is to obtain
a self-consistency condition for the distribution of best trust
transitivity values by describing the direct neighbourhood
of a single vertex, which is based on the following observation:
A randomly chosen vertex u with out-neighbours $w_i$,
each with direct trust from u given by $c_i$, will trust another
randomly chosen vertex v with a best trust transitivity value
of $s^+$ only if $\max_i(c_i s_i^+) = s^+$, where $s_i^+$ is the best trust
transitivity from $w_i$ to v. In a random network which is
sufficiently large, $s^+$ and $s_i^+$ should both be drawn from
the same distribution $p_s^+(s)$. This gives a self-consistency
condition for $p_s^+(s)$ which is given schematically as follows,

[Schematic equation: $p_s^+(s)$ written as a sum of diagrams, one for each number of out-neighbours, each subject to the constraint $\max\{s_i^+ c_i\} = s$.]

where each term corresponds to the probability of the vertex
having a given number of out-neighbours, and the maximum
best trust transitivity being equal to the desired value. Note
that we have explicitly multiplied every instance of $s^+$ with
an arbitrary free variable $u^+$, which cannot be determined by
the above self-consistency alone, and has to be described
separately. Each term on the right is weighted by the out-degree
probability $p_k$. In terms of the cumulative distribution
$P_s^+(s) = \int_0^s du\, p_s^+(u)$, this self-consistency can be expressed as

$$ P_s^+(s) = \sum_k p_k \left[ \beta^+(s) \right]^k, \qquad (9) $$

where $\beta^+(x)$ is the cumulative probability that $s^+ c < x$, with
c distributed by $p_c(c)$, given by

$$ \beta^+(s) = \int_0^1 dx\, p_c(x)\, P_s^+(s/x). \qquad (10) $$



The distribution $p_s^+(s)$ above does not equal $p_s(s)$, due to the
remaining variable $u^+$, for which one must still find an appropriate
distribution. This last piece is obtained by realizing that
$p_s(s)$ must also be subject to a complementary self-consistency
condition in the opposite direction, following the in-neighbours:
A randomly chosen vertex v with in-neighbours $w_i$, each with
direct trust to v given by $c_i$, will be trusted by another randomly
chosen vertex u with a best trust value of $s^-$ only if
$\max_i(c_i s_i^-) = s^-$, where $s_i^-$ is the best trust from u to $w_i$. This
results in an entirely analogous self-consistency condition for
$p_s^-(s)$, where the out-degree distribution $p_k$ is replaced by the
in-degree distribution $p_j$. Since this last self-consistency is also
complete up to a free variable $u^-$, we can formulate the ansatz
that $u^+ = s^-$ and $u^- = s^+$, such that

$$ s = s^+ s^-. \qquad (11) $$

With this connection it is possible to obtain $p_s(s)$ as

$$ p_s(s) = \int_s^1 du\, p_s^-(u)\, p_s^+(s/u)/u, \quad \text{or} \qquad (12) $$

$$ p_s(s) = \int_s^1 du\, p_s^+(u)\, p_s^-(s/u)/u, \qquad (13) $$

and the average $\langle s \rangle$ more directly as

$$ \langle s \rangle = \int_0^1 \int_0^1 ds^-\, ds^+\, s^- s^+\, p_s^-(s^-)\, p_s^+(s^+) \qquad (14) $$

$$ \phantom{\langle s \rangle} = \langle s^- \rangle \langle s^+ \rangle. \qquad (15) $$

By rewriting Eq. 9 in terms of the generating functions of the
in- and out-degree distributions,

$$ G(z) = \sum_k p_k z^k, \qquad F(z) = \sum_j p_j z^j, \qquad (16) $$

one obtains the self-consistency equations in a more compact
form,

$$ P_s^-(s) = F(\beta^-(s)), \qquad (17) $$

$$ P_s^+(s) = G(\beta^+(s)). \qquad (18) $$

These are integral equations, for which there are probably no
general closed form solutions. However, it is possible to solve
them numerically by successive iterations from an initial
distribution, which we chose as $P_s^{\pm}(s) = \Theta(s - 1)$ for the
cumulative distributions, where $\Theta(x)$
is the Heaviside step function. From the numerical solutions
the average values can be obtained as
$\langle s^- \rangle = \int_0^1 ds\, p_s^-(s)\, s = 1 - \int_0^1 ds\, P_s^-(s)$
(where the last expression is obtained by integration
by parts), and in analogous fashion for $\langle s^+ \rangle$. The average
value of best trust transitivity $\langle s \rangle$ is then given by Eq. 15.
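The iteration can be sketched for one simple case: Poisson in- and out-degrees with the same mean, and direct trust equal to 1 with probability γ and to a single value η otherwise. The best trust then only takes the values η^m, so the cumulative distribution is a step function and the integral equation reduces to one scalar fixed-point problem per step. The Python code below follows this reading of the self-consistency condition, P(s) = G(γ P(s) + (1 − γ) P(s/η)) with G(z) = exp(⟨k⟩(z − 1)), and takes ⟨s⟩ = ⟨s+⟩⟨s−⟩ with ⟨s−⟩ = ⟨s+⟩. The discretization, the parameter names and the stopping rule are assumptions of this example, not the authors' implementation.

```python
import math


def mean_best_trust(k_mean, gamma, eta=0.5, levels=200,
                    rel_tol=1e-14, max_iter=100000):
    """Average best trust <s> in the limit of infinite network size.

    u is the complement 1 - P(s) on the interval
    eta**(m + 1) <= s < eta**m; working with the complement avoids
    cancellation errors when P(s) is close to one.
    """
    u_prev = 0.0   # 1 - P(s) = 0 for s >= 1
    s_plus = 0.0   # accumulates the integral of 1 - P(s) over [0, 1)
    for m in range(levels):
        u = 1.0    # initial guess P(s) = 0 below s = 1, i.e. Theta(s - 1)
        for _ in range(max_iter):
            arg = gamma * u + (1.0 - gamma) * u_prev
            u_new = -math.expm1(-k_mean * arg)  # 1 - G(1 - arg)
            converged = abs(u_new - u) <= rel_tol * u_new
            u = u_new
            if converged:
                break
        s_plus += u * (eta ** m - eta ** (m + 1))
        u_prev = u
    s_plus += u_prev * eta ** levels  # remaining interval [0, eta**levels)
    return s_plus * s_plus  # <s> = <s+><s->, with <s-> = <s+>


if __name__ == "__main__":
    for gamma in (0.1, 0.3, 0.5, 1.0):
        print(gamma, mean_best_trust(5.0, gamma))
```

The relative stopping rule matters: below the transition the iteration has to run all the way down to zero, since any residual error at one level is amplified at the next and would produce a spurious positive trust. Close to the transition the convergence becomes very slow and the iteration cap may be reached.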



We turn now to the conditions necessary for non-vanishing
average trust transitivity. Both Eqs. 17 and 18 accept the trivial
solution $P_s^{\pm}(s) = \Theta(s)$, which corresponds to $p_s^{\pm}(s) = \delta(s)$,
i.e. the average best trust is zero. As discussed previously,
for other solutions to be possible, we need to consider a
non-vanishing fraction of edges with absolute trust c = 1 in the
network. Here we will consider direct trust distributions of the
form

$$ p_c(c) = \gamma\, \delta(c - 1) + (1 - \gamma)\, p'_c(c), \qquad (19) $$

which correspond to a fraction $\gamma$ of edges with c = 1, and a
complementary fraction $(1 - \gamma)$ with c given with probability
density $p'_c(c)$. We will consider two different versions of $p'_c(c)$:
A uniform distribution $p'_c(c) = 1$, and a single-valued distribution
$p'_c(c) = \delta(c - \eta)$, with $\eta = 1/2$. We will use two different
degree distributions, the Poisson and Zipf [25], and their respective
generating functions,



$$ p_k = \frac{\langle k \rangle^k e^{-\langle k \rangle}}{k!}, \qquad G(z) = e^{\langle k \rangle (z - 1)}, \qquad (20) $$

or

$$ p_k = \frac{k^{-\tau}}{\zeta(\tau)}, \qquad G(z) = \frac{\mathrm{Li}_\tau(z)}{\zeta(\tau)}, \qquad (21) $$

where $\zeta(\tau)$ is the Riemann $\zeta$ function, and $\mathrm{Li}_n(x)$ is the nth
polylogarithm of x. For simplicity, we will consider only the
situation where $p_j = p_k$, and both j and k are independently
distributed. Fig. 3 shows the values of $\langle s \rangle$ and $\langle t \rangle$ as a
function of $\gamma$ for the different distributions, compared
with numerical computations on actual network realizations of
different sizes. The main feature observed is a first-order transition
from vanishing trust to positive trust, at specific values
of $\gamma$. The transition values $\gamma^*$ correspond exactly to the critical
values of the formation of a giant component of the induced
subgraph composed only of edges with c = 1, which
has average degree $\gamma \langle k \rangle$ [13]. For graphs with Poisson degree
distribution, this corresponds to $\gamma^* = 1/\langle k \rangle$. It is worth observing
that on finite graphs, the average trust does not vanish
very rapidly, and is still non-zero for relatively large networks
with N = 10^ nodes, even when $\gamma = 0$. This is attributed to
the so-called small-world effect, where the average shortest path
scales slowly as $l \sim \ln N$, as in Eq. 8. Therefore in practical
situations where networks are large but finite, $\gamma > \gamma^*$ is not a
strictly necessary condition for system-wide trust propagation.
Another interesting feature is the behaviour of the average trust
in graphs with Zipf degree distribution. There, the transition to
positive trust is of second order, and the critical points are also
$\gamma^* = 1/\langle k \rangle$. Additionally, the values of average trust are smaller
than in networks with Poisson degree distribution and the same
average degree, for intermediate values of $\gamma$ after the transitions.
This is due to the smaller path multiplicity of graphs with
scale-free distribution: Even though the average shortest path
length is smaller in such graphs, the number of alternative paths
is also smaller, due to the dominance of vertices with smaller
degree. Thus, if the shortest path happens to have a small trust
value, there is a higher probability that no alternative path
exists. Fig. 3 also shows the average best trust
for $1 < \tau < 2$, for which the average degree diverges. For such
dense networks, the values of $\langle s \rangle$ are above zero for all values of
$\gamma > 0$, which is simply due to the fact that the average shortest
path length does not diverge in this case.

FIG. 3: Average values of best trust $\langle s \rangle$ and pervasive trust $\langle t \rangle$ as
a function of the fraction of edges with absolute trust $\gamma$. Top left:
Networks with Poisson in- and out-degree distributions, and uniform
trust distribution. Top right and bottom right: Poisson distribution,
and single-valued trust distribution. Bottom left: Zipf distribution, and
single-valued trust distribution. Solid lines correspond to analytical
solutions, and symbols to numerical realizations of several networks
of different sizes: 10'' (red), 10^ (green) and 10*^ (blue) nodes. The
dashed line shows the average direct trust $\langle c \rangle = (\gamma + 1)/2$.



IV. THE PRETTY GOOD PRIVACY (PGP) NETWORK 

In this section we investigate trust propagation on the Pretty
Good Privacy (PGP) network. Broadly, PGP (or more
precisely the OpenPGP standard [14]) refers to a family of computer
programs for encryption and decryption of files, as well
as data authentication, i.e. generation and verification of digital
signatures. It is often used to sign, encrypt and decrypt
email. It implements a scheme of public-key cryptography [15],
where the keys used for encryption/decryption are split in two
parts, one private and one public. Both parts are related in such a
way that the private key is used exclusively for decryption and
creation of signatures, and the public key only for encryption
and signature verification. Thus any user is capable of sending
encrypted messages to a specific user, and of verifying her
signatures, with her public key, but only this user can decrypt
these messages and generate signatures, using her private key,
which she should never disclose. The public keys are usually
published in so-called key servers, which mutually synchronize
their databases, and thus become global non-centralized repositories
of public keys. However, the mere existence of a public
key in a key server, associated with a given identity (usually a
name and an email address), is no guarantee that this key really
belongs to the respective person, since there is no inherent
verification in the submission process. This problem is solved by
the implementation of the so-called web of trust of PGP keys,
whereby a user can attach a signature to the public key of another
user, indicating she trusts that this key belongs to its alleged
owner. The validity of a given key can then be inferred
by transitivity, in a self-organized manner, without the required
presence of a central trust authority. As such, this system represents
an almost perfect example of trust propagation through
transitivity.

As a rule, key signatures should only be made after careful
verification, which usually requires the two parties to physically
meet. Such a requirement transforms the web of trust into a
snapshot of a global social network of acquaintances, since the
vast majority of keys correspond to human users, who tend to
sign keys of people with whom they normally interact. There
is also a tendency to sign keys (upon verification) from people
who do not belong to a close circle of acquaintances, with
the sole purpose of strengthening the web of trust with more
connections. This tendency is well reflected by the so-called
"key signing parties", where participants meet (usually after a
large technological conference) to massively sign each other's
keys [16]. Thus the structure of the PGP network reflects the
global dynamics of self-organization of human peers in a social
context.

This section is divided in two parts. In the first part we 
present some aspects of the topology and temporal organization 
of the network. In the second part we analyze the trust transi- 
tivity in the network, in view of the trust metric we discussed 
previously. 



A. Network topology 

The PGP network used in this work was obtained from a
snapshot of the globally synchronized SKS key servers [26] in
November 2009. It is composed of $N \approx 2.5 \times 10^6$ keys and
$E \approx 7 \times 10^5$ signatures, with a very low average degree of
$\langle j \rangle = 0.28$. This means that many keys are isolated and contain
no signatures. Therefore we will concentrate on the largest
strongly connected component, i.e. a maximal set of vertices
for which there is a path between any pair of vertices in the
set. The number of vertices $N \approx 4 \times 10^4$ in this component is
much smaller, but the network is much denser, with on average
$\langle j \rangle \approx 7.58$ signatures per key (see summarized data in Table I).
It represents the de facto web of trust, since the rest of the network
is so sparsely connected that no trust transitivity can be
inferred from it. We note that keys may have multiple "subkeys"
which correspond to different identities (usually different
email addresses from the same person) and which can individually
sign other subkeys. For simplicity, in this work we have
collapsed subkeys into single keys, and possible multiple signatures
into a single signature. We have also discarded invalid
and revoked keys and signatures.

The number of keys and signatures in the strongly connected
component has been increasing over time, as shown in Fig. 4.
The number of keys (counting only those that are currently valid)
remained approximately constant for some time, then decreased
slightly until around 2002, and has been increasing at an
approximately constant rate since then. We note that the number
of keys may decrease since keys can expire or be revoked. The
                 N        E        $\langle j \rangle$   r      a              c
Whole network    2513677  703142   0.28                  0.45   -0.02152(12)   0.02321(9)
Largest SCC      39796    301498   7.58                  0.69   0.0332(3)      0.461(2)

TABLE I: Summary of statistics for the whole PGP network (above)
and the largest strongly connected component (below). $N$ is the
number of vertices (keys), $E$ is the number of edges (signatures),
$\langle j \rangle$ is the average in-degree, $r$ is the average reciprocity,
$a$ is the assortativity coefficient and $c$ is the average
clustering coefficient.



number of signatures, on the other hand, seems to be increasing
at an accelerating rate, with an acceleration that is approximately
constant and similar to the rate of growth of the number of keys.
This means that the average degree of the network is increasing
with time, as can be seen in Fig. 4. Keys and signatures grow in
an organized manner, as shown by the waiting time distribution
between the creation of two subsequent keys or signatures, also
plotted in Fig. 4. These distributions are broad over several
orders of magnitude, from the order of seconds to days,
approximately following a power law in this region. The fact that
keys and signatures are often created only seconds apart, and that
the waiting time distribution lacks any discernible characteristic
scale except for a cut-off at large times ($\approx 1$ day), shows that
the network does not grow in a purely random fashion (which would
generate exponentially distributed waiting times), and serves as
a signature of an underlying organized growth process.
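
The contrast drawn above between exponentially distributed and broad waiting times can be illustrated with a small synthetic experiment. The sketch below is not the analysis behind Fig. 4: the two event streams are generated artificially, the Pareto shape parameter 1.5 is an arbitrary assumption, and the coefficient of variation is used only as a simple diagnostic (it is close to 1 for a Poisson process and larger for bursty, heavy-tailed growth).

```python
import math
import random


def waiting_times(timestamps):
    """Positive gaps between consecutive events."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:]) if b > a]


def log_binned_density(values, bins_per_decade=4):
    """Histogram with logarithmically spaced bins, normalised by bin width.

    Returns a list of (geometric bin centre, probability density) pairs,
    suitable for a log-log plot of a distribution spanning many decades.
    """
    lowest_decade = math.floor(math.log10(min(values)))
    counts = {}
    for v in values:
        idx = math.floor((math.log10(v) - lowest_decade) * bins_per_decade)
        counts[idx] = counts.get(idx, 0) + 1
    density = []
    for idx in sorted(counts):
        left = 10 ** (lowest_decade + idx / bins_per_decade)
        right = 10 ** (lowest_decade + (idx + 1) / bins_per_decade)
        density.append((math.sqrt(left * right),
                        counts[idx] / (len(values) * (right - left))))
    return density


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


def event_times(draw_gap, n_events):
    """Cumulative event times built from independent gaps."""
    t, times = 0.0, []
    for _ in range(n_events):
        t += draw_gap()
        times.append(t)
    return times


rng = random.Random(0)
# Purely random growth: exponential gaps (Poisson process).
poisson_gaps = waiting_times(event_times(lambda: rng.expovariate(1.0), 20000))
# Bursty growth: heavy-tailed gaps (assumed Pareto shape parameter 1.5).
bursty_gaps = waiting_times(event_times(lambda: rng.paretovariate(1.5), 20000))

cv_poisson = coefficient_of_variation(poisson_gaps)
cv_bursty = coefficient_of_variation(bursty_gaps)
bursty_density = log_binned_density(bursty_gaps)
```

Logarithmic binning is chosen here because a distribution that spans seconds to days would leave most linear bins empty; dividing by the bin width keeps the estimate a proper density.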




FIG. 4: Number of keys and signatures as a function of time (in
years) for the strongly connected component of the PGP network, and
waiting time distribution between new keys and signatures, as a
function of the waiting time $\Delta t$ (in days). The straight lines
are power laws in $\Delta t$ with decay exponents 1.3 (top) and 0.18
(bottom).

We will characterize the topology of the network by its degree
distribution and nearest-neighbour degree correlations, as well
as other standard network measures such as clustering [17],
reciprocity [18] and community structure [19]. We will pay
special attention to the most highly connected vertices, some
of which correspond to so-called certificate authorities and
display a distinct connectivity pattern, which has a special
meaning for trust propagation.

The network has very heterogeneous degree distributions, as
can be seen in Fig. 5, with some keys having on the order of
$10^3$ signatures. They are possibly compatible with a power law
with exponent $\approx 2.5$ for large degrees, but the distributions
are not broad enough for a precise identification. The number of
signatures on a given key (the in-degree) and the number of
signatures made by the same key (the out-degree) are strongly
correlated, as can be seen in Fig. 6, which shows the average
out-degree $\langle k \rangle$ as a function of the in-degree $j$. This is explained




FIG. 6: Left: Average out-degree as a function of the in-degree of the 
same vertex. Right: Average edge reciprocity, as a function of the in 
or out-degree of the source vertex. 




FIG. 5: Several statistical properties of the PGP network. Top left:
In- and out-degree distributions, $p_j$ and $p_k$ respectively. The
solid line corresponds to a power law with exponent 2.5. Top right:
Average in- and out-degree of the nearest out-neighbours, as a
function of the in- and out-degree. Bottom left: Average clustering
coefficient as a function of in- and out-degree. Bottom right:
Distribution of community sizes, for the unmodified ("Original") and
shuffled ("Shuffled") versions of the network. The solid lines
correspond to power laws with exponent 2.3 (top) and 3.8 (bottom).

by the high reciprocity of the edges in the network, i.e. if a key
$a$ signs a key $b$, there is a very high probability that key $b$
signs key $a$ as well. This is easy to understand, since key
verification usually requires physical presence, and both parties
take the opportunity to verify each other's keys in the same
encounter. The edge reciprocity [18] is quantified as the fraction
$r = n^{\leftrightarrow}/E$, where $n^{\leftrightarrow}$ is the number of reciprocal
edges and $E$ is the total number of edges in the network. The PGP
network has a high value of $r = 0.69$. The reciprocity is
distributed in a slightly heterogeneous fashion across the network,
as shown in Fig. 6, which plots the average reciprocity of the
edges as a function of the in- and out-degrees of the source
vertex. Keys with very few signatures tend to act in a very
reciprocal manner, whereas the more prolific signers receive fewer
signatures back. This heterogeneity is further amplified when one
considers the degree correlation between nearest neighbours, as
shown in Fig. 5, which plots the average in- and out-degree,
$\langle j \rangle_{nn}$ and $\langle k \rangle_{nn}$, of the nearest out-neighbours of
the vertices in the network, as a function of the in- and
out-degree of the source vertex, $j$ and $k$. The degree
correlation shows an assortative regime for intermediate



degree values ($\approx 10$ to $40$), meaning that vertices with higher
degrees are connected preferentially with other vertices of high
degree, but also some disassortative features for vertices with
very high and very low degrees, where vertices with low degree
are connected preferentially with vertices with high degree, and
vice versa. This mixed connectivity pattern leads to a very low
scalar assortativity coefficient [27] of $a = 0.0332(3)$, which
is unusual for social networks [20].

These differences become clearer when one investigates more
closely the keys with the largest degree in the network, as shown
in Table II. As with the rest of the network, most of the largest
keys belong to individuals, with the exception of the first and
third keys with the most signatures, which belong to entities.
These entities are known as certificate authorities and are
created by organizations with the intent of centralizing
certification. The largest authority is the community-driven
CAcert.org, which issues digital certificates of various kinds to
the public, free of charge [28]. The second largest authority is
the German magazine c't, which initiated a PGP certification
campaign in 1997 [29]. These authorities interact with individuals
in a different manner, acting as a central mediator between
loosely connected peers. This is evident from their low clustering
coefficient ($c \approx 0.003$), which is one order of magnitude lower
than that of the other (human) hubs ($c \approx 0.05$ to $0.11$), and
from the average in-degree of their out-neighbours, which is also
significantly smaller than that of their human counterparts
($\approx 17$ vs. $60$ to $80$, respectively).

These different patterns represent distinct paradigms of trust
organization, authority-based vs. community-based, each with its
own set of advantages and disadvantages. An authority-based
scenario relies on a few universally trusted vertices which
mediate all trust propagation. In this way, the responsibility of
key verification is concentrated heavily on these vertices, which
reduces the total amount of verification necessary, and is thus
more efficient. The most obvious disadvantage is that the
authorities represent central points of failure: if an authority
itself is not trusted, neither will be the keys it certifies.
Additionally, this approach may increase the probability of
forgery, since only one party needs to be deceived in order for
global trust to be achieved. The complementary scenario is the
community-based approach, where densely connected clusters of
vertices provide certification for each other. This obviously
requires more diligence from the participants, but has the
advantage of greater resilience against errors, since the
multiplicity of different paths between vertices is much larger.
In the PGP network both these paradigms seem to be present
simultaneously, as can be observed in detail by extracting its
community structure [19]. This is done by obtaining
Key ID            Name                                                        j    k     $\langle j \rangle_{nn}$  c       Date
D2BB0D0165D0FD58  CA Cert Signing Authority (Root CA) <gpg@cacert.org>        965  1507  17.5(8)   0.0031  2003-07-11
2F951508AAE6022E  Karlheinz Geyer (TUD) <geyerk.fv.tti@nds.tu-darmstadt.de>   661  744   59(2)     0.0660  2004-12-07
DBD245FCB3B2A12C  ct magazine CERTIFICATE <pgpCA@ct.heise.de>                 597  1348  18.3(12)  0.0033  1999-05-11
69D2A61DE263FCD4  Kurt Gramlich <kurt@skolelinux.de>                          406  644   71(3)     0.0807  2002-10-17
948FD6A0E10F502E  Marcus Frings <protagonist@gmx.net>                         387  381   82(5)     0.1110  2002-03-22
29BE5D2258FD549F  Martin Michlmayr <tbm@cyrius.com>                           385  436   56(4)     0.0499  1999-08-04
556D362CEE0977E8  Jens Kubieziel <jens@kubieziel.de>                          369  414   73(4)     0.1098  2002-08-23
3F101691D98502C5  Elmar Hoffmann <elho@elho.net>                              352  1     348       0.1122  2005-02-17
957952D7CF3401A9  Elmar Hoffmann <elho@elho.net>                              348  311   84(5)     0.1086  2005-02-17
CE8A79D798016DC7  Josef Spillner <josef@coolprojects.org>                     344  429   71(4)     0.1007  2001-05-22
89CD4B21607559E6  Benjamin Hill (Mako) <mako@atdot.cc>                        325  319   70(5)     0.0801  2000-07-13

TABLE II: The eleven keys with the largest number of signatures in the network, their respective in-degree $j$, out-degree $k$, average in-degree of
the nearest out-neighbours $\langle j \rangle_{nn}$, clustering coefficient $c$, and date of creation.



the community partition of the network which maximizes the
modularity $Q$ of the network, defined as

$$Q = \frac{1}{2E} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2E} \right] \delta_{s_i, s_j}, \qquad (22)$$



where $E$ is the total number of edges, $A_{ij}$ is the adjacency
matrix of the network, $k_i$ is the degree of vertex $i$, $s_i$ is the
community label of vertex $i$ and $\delta$ is the Kronecker delta.
According to this definition, a partition with high values of $Q$
is possible for networks with densely connected groups of
vertices, with fewer connections between different groups. The
maximum value of $Q = 1$ is achieved only for "perfect" partitions
of extremely segregated communities. We note that the above
definition is meaningful only for undirected graphs, and thus we
apply it to the undirected version of the PGP network, where the
direction of the edges is ignored. We used the method of Reichardt
et al. [21] to obtain the best partition, which resulted in a
modularity value of $Q \approx 0.73$. As a comparison, we computed the
modularity for a shuffled version of the network, where the edges
were randomly placed, but the degrees of the vertices were
preserved, which resulted in the significantly smaller value
$Q \approx 0.03$. The distribution of community sizes seems to have a
power-law tail with exponent $\approx 2.3$ ($\approx 3.8$ for the shuffled
network), characterizing a scale-free structure.

By isolating the individual communities, one can clearly see
strong differences between those in the vicinity of the
certificate authorities and "regular" communities. Fig. 7 shows
two representative examples of these two types of communities. On
top is the community around the CAcert.org certificate authority,
which is composed of 677 keys, with an average of 6.9 signatures
per key. Its degree distributions are shown on the side, from
which the large discrepancy between the most central vertex and
the rest of the community can be observed. The colors on the
vertices correspond to the top-level domain (TLD) of the email
addresses associated with each key, and serve as an indication of
the geographical proximity of the individuals. For the community
containing CAcert.org, a high degree of geographical heterogeneity
is present. This is corroborated also by the fact that there are
fewer direct edges between individuals. The bottom of Fig. 7 shows
a community composed almost exclusively of keys with Austrian
email addresses (.at TLD), which displays a completely different
pattern, lacking any central authority. It is smaller, with 287
keys, but denser, with 10 signatures per key. This pattern is
repeated for most of the largest communities in the graph. Some
non-centralized communities have a broader degree distribution
than the Austrian community, but only those associated with
certificate authorities display a centralized pattern such as in
the top of Fig. 7.
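
Two of the measures used in this subsection, the edge reciprocity $r$ and the modularity $Q$ of a given partition, are simple enough to compute directly. The sketch below evaluates both on hypothetical toy graphs; it only scores a partition that is already given and does not perform the modularity maximization of Reichardt et al. The function names and toy data are assumptions made for illustration.

```python
from collections import Counter


def reciprocity(directed_edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(directed_edges)
    reciprocal = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocal / len(edge_set)


def to_undirected(directed_edges):
    """Ignore edge direction and collapse reciprocal pairs into one edge."""
    return sorted({tuple(sorted(e)) for e in directed_edges if e[0] != e[1]})


def modularity(undirected_edges, label):
    """Q = (1/2E) * sum_ij [A_ij - k_i k_j / (2E)] * delta(s_i, s_j).

    Grouping the double sum by community gives, for each community s,
    (edges inside s) / E  -  (total degree of s / 2E)^2.
    """
    n_edges = len(undirected_edges)
    degree, inside = Counter(), Counter()
    for u, v in undirected_edges:
        degree[u] += 1
        degree[v] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    total_degree = Counter()
    for node, k in degree.items():
        total_degree[label[node]] += k
    return sum(inside[s] / n_edges - (total_degree[s] / (2 * n_edges)) ** 2
               for s in total_degree)


# Toy directed graph: one mutual signature and one one-way signature.
r = reciprocity([("a", "b"), ("b", "a"), ("b", "c")])

# Toy undirected graph: two triangles joined by a single bridge edge.
triangles = to_undirected([(0, 1), (1, 0), (1, 2), (0, 2),
                           (3, 4), (4, 5), (3, 5), (2, 3)])
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in split}

q_split = modularity(triangles, split)    # the natural two-community partition
q_merged = modularity(triangles, merged)  # everything in a single community
```

For the two-triangle graph the natural partition gives $Q = 6/7 - 1/2 = 5/14$, while placing all vertices in a single community gives $Q = 0$, which is why a maximization over partitions is needed before $Q$ says anything about community structure.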

We now turn to the trust propagation on the PGP network. 




FIG. 7: Two example communities of the PGP network, and their in- 
and out-degree distributions. The colors on the vertices correspond to 
the top-level domain (TLD) of the email addresses. Top: Community 
containing the CACert.org certificate authority. Bottom: Community 
composed mostly of Austrian email addresses (.at TLD). 
B. Trust transitivity

In order to properly investigate trust transitivity in the PGP
network, it is necessary to know the direct trust values
associated with each signature, which indicate the level of
scrutiny in the key verification process. The OpenPGP standard
[14] defines four trust "classes" for signatures, according to the
degree of verification made. Unfortunately, these classes are
universally ignored, and most signatures fall into the "generic"
class, from which no assertion can be made. Since the actual level
of verification of the keys is in fact unknown, we will
investigate hypothetical situations which represent different
strategies the PGP users may use to verify keys. In the previous
subsection we showed that the network is composed of different
connection patterns: community clusters and centralized trust
authorities. Depending on which of these connection patterns is
judged more trustworthy, the values of transitive trust will be
different. Here we will consider three possible scenarios:
1. Random distribution, 2. Authority-centered trust, and
3. Community-centered trust. In all situations we will consider
that all signatures have the same trust value of $c = 1/2$, except
for a fraction $\gamma$ of edges which have absolute trust $c = 1$, and
which are selected as follows for each situation:

1. Random: The $\gamma E$ edges are chosen randomly among all
$E$ edges.

2. Authority-centered: The $\gamma E$ edges with the largest
betweenness [22] $b_e$ are chosen, which is defined as

$$b_e = \sum_{i \neq j} \frac{\sigma_{ij}(e)}{\sigma_{ij}}, \qquad (23)$$

where $\sigma_{ij}$ is the number of shortest paths from vertex $i$
to $j$, and $\sigma_{ij}(e)$ is the number of these paths which
contain the edge $e$. This distribution favours edges adjacent
to nodes with high degree, and also edges which bridge
different communities.

3. Community-centered: The $\gamma E$ edges with the largest
edge clustering $c_e$ are chosen, which is defined as

$$c_e = \frac{\sum_i A_{s(e),i} A_{i,t(e)}}{\sqrt{k_{s(e)} \, j_{t(e)}}}, \qquad (24)$$

where $s(e)$ and $t(e)$ are the source and target vertices of
edge $e$, $A_{ij}$ is the adjacency matrix, and $j_i$ and $k_i$ are
the in- and out-degrees of vertex $i$, respectively. This
quantity measures the density of out-neighbours of $s(e)$
which are also in-neighbours of $t(e)$, and simultaneously
the density of in-neighbours of $t(e)$ which are
out-neighbours of $s(e)$. This distribution favours edges
which belong to densely connected communities. For instance,
the edges of a clique (i.e. a complete subgraph) will all
have the value $c_e = 1 - 1/(n - 1)$, where $n$ is the size of
the clique, which will approach the maximum value $c_e = 1$
for a sufficiently large clique size.
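
The three scenarios listed above differ only in the score used to rank edges before the top $\gamma E$ of them receive absolute trust. The sketch below makes this concrete on a hypothetical toy graph (a four-vertex clique attached to a hub "H" with two leaves). Two points are assumptions rather than statements of the original method: the edge betweenness is computed with a Brandes-style accumulation, and the edge clustering is normalised by the geometric mean of the source out-degree and target in-degree, a choice made here because it reproduces the clique value $1 - 1/(n-1)$ quoted in the text.

```python
import math
import random
from collections import defaultdict, deque


def edge_betweenness(edges):
    """Sum over ordered pairs of the fraction of shortest paths using each edge."""
    out_adj, nodes = defaultdict(list), set()
    for u, v in set(edges):
        out_adj[u].append(v)
        nodes.update((u, v))
    score = defaultdict(float)
    for source in nodes:
        dist, order = {source: 0}, []
        sigma, preds = defaultdict(float), defaultdict(list)
        sigma[source] = 1.0
        queue = deque([source])
        while queue:  # breadth-first search counting shortest paths
            u = queue.popleft()
            order.append(u)
            for v in out_adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = defaultdict(float)
        for v in reversed(order):  # accumulate dependencies back to the source
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[(u, v)] += share
                delta[u] += share
    return score


def edge_clustering(edges):
    """Common neighbours s -> i -> t, normalised by sqrt(k_s * j_t) (assumed form)."""
    edge_set = set(edges)
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edge_set:
        out_nb[u].add(v)
        in_nb[v].add(u)
    return {(s, t): len(out_nb[s] & in_nb[t])
            / math.sqrt(len(out_nb[s]) * len(in_nb[t]))
            for s, t in edge_set}


def assign_trust(edges, score, gamma, base=0.5):
    """Give absolute trust (1.0) to the top gamma*E edges, `base` to the rest."""
    ranked = sorted(set(edges), key=lambda e: (-score.get(e, 0.0), e))
    n_absolute = round(gamma * len(ranked))
    return {e: (1.0 if i < n_absolute else base) for i, e in enumerate(ranked)}


# Hypothetical toy graph: clique {a, b, c, d}; hub H linked to a, x and y.
clique = ["a", "b", "c", "d"]
edges = [(u, v) for u in clique for v in clique if u != v]
for peer in ("a", "x", "y"):
    edges += [("H", peer), (peer, "H")]

rng = random.Random(1)
scores = {
    "random": {e: rng.random() for e in edges},
    "authority": edge_betweenness(edges),
    "community": edge_clustering(edges),
}
trust = {name: assign_trust(edges, s, gamma=1 / 3) for name, s in scores.items()}
absolute = {name: {e for e, c in t.items() if c == 1.0} for name, t in trust.items()}
```

In this toy graph the authority-centered ranking places all absolute trust on edges touching the hub, whereas the community-centered ranking places it on edges inside the clique that do not touch the hub's contact vertex, which mirrors the segregation discussed in the text.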

Fig. 8 shows the average best trust transitivity, Eq. 1,
and the average pervasive trust, Eq. 6, for the PGP network, as a
FIG. 8: Average best trust $\langle s \rangle$ and pervasive trust $\langle t \rangle$, as a function of
the fraction $\gamma$ of edges with absolute trust, for the PGP network. The
different curves correspond to the different trust distribution scenarios
described in the text.



function of $\gamma$ according to the different approaches. We note
that, due to the relatively small size of the network, no
discontinuous transition is seen. The authority-centered trust
leads to significantly higher values of $\langle s \rangle$ and $\langle t \rangle$, and
the community-centered distribution to the lowest values. This is
expected, since distributing trust according to the edge
betweenness essentially optimizes trust transitivity, putting the
highest values along the shortest paths between vertices. The
community-centered approach does exactly the opposite, favoring
intra-community connections, and results in the lowest values of
average trust. Thus, favoring the hubs and authorities is clearly
more efficient, if the objective is solely to increase the average
trust in the network. However, pure efficiency may not be what is
desired, since it relies on the opinion of a much smaller set of
vertices. This eases the job of dishonest parties, who need only
convince these vertices in order to be trusted by a large portion
of the network.

Some of these issues become clearer by observing how nodes with
different degrees receive trust with each of these strategies, as
shown in Fig. 9. For the random distribution of trust, the
vertices with higher degree receive a natural bias in the values
of average best in-trust, $\langle s \rangle$, since the shortest paths
leading to them tend to be shorter. But the fair nature of the
definition of $t$ compensates for this, and the values of
$\langle t \rangle$ are almost independent of the in-degree of the
vertices. The highly connected nodes become more trusted only with
the authority-centered approach. Interestingly, in this situation
the nodes with the smallest degrees also receive a large value of
trust, since most of them are "fringe" nodes connected only with
the hubs (see Fig. 5). The vertices with intermediate degrees are
thus left in limbo, and are in effect penalized for their
community pattern. The almost symmetrically opposite situation is
obtained with the community-centered trust distribution, where
both the vertices with the smallest and the largest degrees
receive the smallest trust values, and the intermediate nodes are
judged more trustworthy due to their strong communities. We note
that this effect is not due simply to the way the values of trust
are distributed, but depends strongly on the existence of
communities in the network. This is evident when the same trust
distribution is applied to a shuffled version of the network, with
the same degree sequence, as shown in Fig. 9. For such a network,
the community structure disappears, and the highly connected nodes
take the lead again.
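
The sensitivity of the average trust to where the absolute-trust edges are placed can be seen on a very small example. The sketch below assumes that the best trust transitivity between two vertices is the largest product of direct trust values over all directed paths connecting them, which is how the metric is understood here; the pervasive trust $t$ is not implemented. Because all trust values lie in $[0, 1]$, the maximum product can be found with a Dijkstra-style search. The chain graph and function names are hypothetical.

```python
import heapq


def best_trust(trust, source):
    """Largest product of direct trust over paths from `source` (assumed metric)."""
    out_adj = {}
    for (u, v), c in trust.items():
        out_adj.setdefault(u, []).append((v, c))
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap via negated trust values
    done = set()
    while heap:
        negative_value, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, c in out_adj.get(u, ()):
            candidate = -negative_value * c
            if candidate > best.get(v, 0.0):
                best[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    return best


def average_best_trust(trust):
    """Mean best trust over all ordered pairs of distinct vertices."""
    nodes = sorted({x for edge in trust for x in edge})
    total = 0.0
    for u in nodes:
        reached = best_trust(trust, u)
        total += sum(reached.get(v, 0.0) for v in nodes if v != u)
    return total / (len(nodes) * (len(nodes) - 1))


# Chain a -> b -> c -> d in which every signature has trust 1/2.
chain = {("a", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.5}
s_plain = best_trust(chain, "a")["d"]

# The same chain with the middle edge promoted to absolute trust.
promoted = dict(chain)
promoted[("b", "c")] = 1.0
s_promoted = best_trust(promoted, "a")["d"]

# A longer detour made only of absolute-trust edges beats the short path.
detour = dict(chain)
detour.update({("a", "x"): 1.0, ("x", "y"): 1.0, ("y", "z"): 1.0, ("z", "d"): 1.0})
s_detour = best_trust(detour, "a")["d"]

mean_plain = average_best_trust(chain)
mean_promoted = average_best_trust(promoted)
```

With $c = 1/2$ on every edge the trust halves at each step, so trust between distant vertices is only preserved along paths of absolute trust. Promoting the middle edge of the chain, which lies on the largest number of shortest paths, raises the average more than promoting an end edge would, in line with the efficiency of the betweenness-based scenario.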




FIG. 9: Average best trust $\langle s \rangle$ and pervasive trust $\langle t \rangle$, as a function of the in-degree $j$ and the fraction $\gamma$ of edges with absolute trust, for the PGP
network. The different plots correspond to the different trust distribution scenarios described in the text: (a) random distribution, (b) authority-
centered distribution and (c) community-centered distribution. The plots (d) correspond to a community-centered distribution applied to a shuffled
version of the network, with the same degree sequence.



V. CONCLUSION 

We investigated properties of trust propagation on networks,
based on the notion of trust transitivity. We defined a trust
metric, called pervasive trust, which provides inferred trust
values for pairs of nodes, based on a network of direct trust
values. The metric extends trust transitivity to the situation
where multiple paths between source and target exist, by combining
the best trust transitivity to the in-neighbours of a given target
node, and their direct trust to the target. The trust values so
obtained are unbiased, personalized and well defined for any
possible network topology. Equipped with this metric we analyzed
the conditions necessary for global trust propagation in large
systems, using random networks with arbitrary degree distributions
as a simple model. We analytically obtained the average best trust
transitivity (as well as pervasive trust) as a function of the
fraction $\gamma$ of edges with absolute trust $c = 1$. We found that
there is a specific value $\gamma = \gamma^*$, below which the average
trust is always zero. For $\gamma > \gamma^*$ the average value jumps
discontinuously to a positive value.

Using the defined trust metric, we investigated trust
propagation in the Pretty Good Privacy (PGP) network [16]. We
gave an overview of the most important topological and dynamical
features of the PGP network, and identified mixed connectivity
patterns which are relevant for trust propagation: namely the
existence of trust authorities and of densely connected
non-centralized communities. Based on these distinct patterns, we
formulated different scenarios of direct trust distribution, and
compared the average inferred trust which results from them. We
found that an authority-centered approach, where direct trust is
given preferentially to nodes which are more central, leads to a
much larger average trust, but at the same time benefits nodes at
the fringe of the network, which are only connected to the
authority hubs, and for which no other information is available.
Symmetrically, a community-centered approach, where edges
belonging to densely connected communities are favoured with more
trust, results in less overall trust, but both the fringe nodes
and the authorities receive significantly less trust than average.
These differences are not simply due to the different ways the
direct trust is distributed, but rather to the fact that the dense
communities and the trust authorities are somewhat segregated.
These differences illustrate the advantages and disadvantages of
both paradigms of trust propagation, which seem to coexist in the
PGP network. They also serve as an insightful example of how
dramatically the direct trust distribution can influence the
inferred trust, even when the underlying topology remains the
same.

In this work, we have concentrated on static properties of
trust propagation. However, most trust-based systems are dynamic,
and change according to rules which are influenced by the trust
propagation itself. One particularly good example is market
dynamics [1-3], where sellers (or borrowers) do not perform well
if they have a poor track record, which will be partially
influenced by trust. Thus, it remains to be seen how trust
transitivity can be carried over to such types of models, and
what role it plays in shaping their dynamics.